Abstract. We consider infinite duration alternating move games. These
games were previously studied by Roth, Balcan, Kalai and Mansour [10].
They presented an FPTAS for computing an approximated equilibrium,
and conjectured that there is a polynomial algorithm for finding an exact
equilibrium [9]. We extend their study in two directions: (1) We show that
finding an exact equilibrium, even for two-player zero-sum games, is
polynomial time equivalent to finding a winning strategy for a (two-player)
mean-payoff game on graphs. The existence of a polynomial algorithm
for the latter is a long-standing open question in computer science. Our
hardness result for two-player games suggests that two-player alternating
move games are harder to solve than two-player simultaneous move
games, while the work of Roth et al. suggests that for $k \ge 3$, $k$-player
games are easier to analyze in the alternating move setting. (2) We show
that optimal equilibriums (with respect to the social welfare metric) can
be obtained by pure strategies, and we present an FPTAS for computing
a pure approximated equilibrium that is $\delta$-optimal with respect to the
social welfare metric. This result extends the previous work by presenting
an FPTAS that finds a much more desirable approximated equilibrium.
We also show that if there is a polynomial algorithm for mean-payoff
games on graphs, then there is a polynomial algorithm that computes an
optimal exact equilibrium, and hence, (two-player) mean-payoff games
on graphs are inter-reducible with $k$-player alternating move games, for
any $k \ge 2$.



1 Introduction 

In this work, we investigate infinitely repeated games in which players alternate 
making moves. This framework can model, for example, five telecommunication 
providers competing for customers: each company can observe the price that is 
set by the others, and it can update the price at any time. In the short term, 
each company can benefit from undercutting its opponents' prices, but since the
game is repeated indefinitely, in some settings, it might be better to coordinate 
prices with the other companies. Such examples motivate us to study equilibria 
in alternating move games. 

In this work, we study infinitely repeated $k$-player $n$-action games. In such
games, in every round, a player chooses an action, and the utility of each player
(for the current round) is determined according to the $k$-tuple of actions of the
players. Each player's goal is to maximize his own long-run average utility as the
number of rounds tends to infinity.



These games were studied by Roth et al. in [10], and they showed an FPTAS
for computing an $\epsilon$-equilibrium. Their result provided a theoretical separation
between the alternating move model and the simultaneous move model, since
for the latter, it is known that an FPTAS for computing approximate equilibria
does not exist for games with $k \ge 3$ players unless P=PPAD. Their result was
obtained by a simple reduction to mean-payoff games on graphs. These games
were presented in [5], and they play an important role in automata theory and
in economics. The computational complexity of finding an exact equilibrium for 
such games is a long-standing open problem, and despite many efforts [3, 6, 7, 12],
there is no known polynomial solution for this problem.

We extend the work in [10] by investigating the complexity of an exact
equilibrium (which was stated as an open question in [10]), and by investigating
the computational complexity of finding a $\delta$-optimal approximated equilibrium
with respect to the social welfare metric. Our main technical results are as
follows:

— We show a reduction from mean-payoff games on graphs to two-player
zero-sum alternating move games, and thus we prove that $k$-player alternating
move games are computationally equivalent to mean-payoff games on graphs
for any $k \ge 2$.

— We show that optimal equilibrium can be obtained by pure strategies, and
we show an FPTAS for computing a $\delta$-optimal $\epsilon$-equilibrium. In addition,
we show that computing an exact optimal equilibrium is polynomial time
equivalent to solving mean-payoff games on graphs.

We note that the first result may suggest that two-player alternating move games
are harder than two-player simultaneous move games, since a polynomial time
algorithm to solve the latter is known [5]. Hence, along with the result of [10],
we get that simultaneous move games are easier to solve than
alternating move games for two-player games, and are harder to solve for
$k \ge 3$ player games.

This paper is organized as follows. In the next section we give formal
definitions for alternating move games and mean-payoff games on graphs. In
Section 3 we show that alternating move games are at least as hard as mean-payoff
games on graphs. In Section 4 we investigate the properties of optimal
equilibriums, and we present an FPTAS for computing a $\delta$-optimal $\epsilon$-equilibrium.

2 Definitions 

In this section we give the formal definitions for alternating move repeated
games and mean-payoff games on graphs. Alternating move games are presented
in Subsection 2.1 and mean-payoff games are presented in Subsection 2.2.

1 Although, to the best of our knowledge, there is no hardness conjecture for mean- 
payoff games. 



2.1 Alternating move repeated games 

Actions, plays and utility function. A $k$-player $n$-action game is defined
by an action set $A_i$ for every player $i$, and by $k$ utility functions, one for each
player, $u_i : A_1 \times \cdots \times A_k \to [-1, 1]$. W.l.o.g. we assume that all action
sets have the same size, and we denote it by $n$. We note that any game can be rescaled
so its utilities are bounded in $[-1, 1]$; however, the FPTAS that was presented
in [10], and our results in Subsection 4.4, crucially rely on the assumption that
the utilities are in the interval $[-1, 1]$.

An alternating move game is played for infinitely many rounds. In round $t$,
player $j = 1 + (t \bmod k)$ plays action $a_j^t$, and a vector of actions $a^t = (a_1^t, \dots, a_k^t)$
is produced, where $a_i^t \in A_i$ is the last action of player $i$. In every round $t$, player
$i$ receives a utility $u_i(a^t)$, which depends only on the last action of each of the
$k$ players (w.l.o.g. the utility in the first $k$ rounds is zero for all players). A
sequence of infinite rounds forms a play, and we characterize a play either by
an infinite sequence of actions or by the corresponding sequence of vectors of
actions. The utility of player $i$ in a play $a^1 a^2 \dots a^t a^{t+1} \dots$ is the limit average
payoff, namely, $\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} u_i(a^t)$. When this limit does not exist, we define
the utility of the play for player $i$ to be $\liminf_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} u_i(a^t)$. We note that
in the framework of [10], the utility of a play was undefined when the limit
does not exist. The results we present in this paper for the liminf metric hold
also for the framework of [10]. On the other hand, if we would take the limsup
value instead, then the problem is much easier, and we can easily produce a
polynomial algorithm to solve these games.
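As a small illustration of the limit-average utility defined above, the following Python sketch computes the average payoff of a finite prefix and the limit average of a play that eventually repeats a fixed cycle of action vectors forever (for such plays the limit exists and equals the average over the cycle). The function names and the toy utility function are assumptions made for this example only; they do not come from [10] or from this paper.

```python
from fractions import Fraction

def prefix_average(action_vectors, utility):
    """Average payoff (1/n) * sum_t u_i(a^t) over a finite prefix of a play."""
    total = sum(Fraction(utility(a)) for a in action_vectors)
    return total / len(action_vectors)

def limit_average_of_periodic_play(cycle_vectors, utility):
    """Limit-average utility of a play that eventually repeats `cycle_vectors`.

    Any finite prefix before the cycle does not affect the limit, so the
    utility is simply the average payoff over one traversal of the cycle.
    """
    return prefix_average(cycle_vectors, utility)

# Hypothetical two-player example: player 1 gets +1 when the actions match.
def u1(a):
    return 1 if a[0] == a[1] else -1

cycle = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(limit_average_of_periodic_play(cycle, u1))
```

Exact rational arithmetic is used so that comparisons between prefix averages and the limit are not affected by floating-point rounding.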

Strategies. A strategy is a recipe for a player's next action, based on the entire
history of previous actions. Formally, a (mixed) strategy for player $i$ is a
function $\sigma_i : (A_1 \times \cdots \times A_k)^* \times A_1 \times \cdots \times A_{i-1} \to \Delta(A_i)$, where $\Delta(S)$ denotes the set
of probability distributions over a finite set $S$. We say that $\sigma_i$ is a pure strategy
if it always returns a degenerate distribution. A strategy profile is a vector $\sigma = (\sigma_1, \dots, \sigma_k)$
that defines a strategy for every player. A profile of pure strategies uniquely
determines the action vector in every round and yields a utility vector for the 
players. A profile of mixed strategies determines, for every round t in the play, 
a distribution of sequences of action vectors, and the average payoff in round t 
is the expected average payoff over the distribution of action vectors. Formally, 
for a strategy profile $\sigma$ we denote the average payoff of player $i$ in round $t$ by

$$P_{i,t}(\sigma) = \mathbb{E}_{\sigma}\left[\frac{u_i(a^1) + \cdots + u_i(a^t)}{t}\right]$$

and the utility of player $i$ is $\liminf_{t \to \infty} P_{i,t}(\sigma)$.

Equilibria, $\epsilon$-equilibria and optimal equilibria. A strategy profile forms an
equilibrium if none of the players can strictly improve his utility (that is induced
by the profile) by unilaterally deviating from his strategy (that is defined by the
profile). For every $\epsilon > 0$, we say that a strategy profile forms an $\epsilon$-equilibrium
if none of the players can improve his utility by more than $\epsilon$ by unilaterally
deviating from his strategy.



The social welfare of a strategy profile is the sum of the utilities of all players.
An equilibrium (resp. an $\epsilon$-equilibrium) is called an optimal equilibrium
(optimal $\epsilon$-equilibrium) if its social welfare is not smaller than the social welfare of
any other equilibrium ($\epsilon$-equilibrium). For $\delta > 0$, an equilibrium (resp. an
$\epsilon$-equilibrium) is called a $\delta$-optimal equilibrium if its social welfare is smaller
by at most $\delta$ than the social welfare of any other equilibrium
($\epsilon$-equilibrium).

2.2 Mean-payoff games on graphs 

Plays and payoffs. A mean-payoff game on a graph is defined by a weighted
directed bipartite graph $G = (V = V_1 \cup V_2, E, w : E \to \mathbb{Q})$ and an initial vertex
$v \in V$. The game consists of two players, namely, maximizer (who owns $V_1$) and
minimizer (who owns $V_2$). Initially, a pebble is placed on the initial vertex, and in
every round, the player who owns the vertex in which the pebble resides advances
the pebble into an adjacent vertex. This process is repeated forever and forms a
play. A play is characterized by a sequence of edges, and the average payoff of a
play $\rho = e_1 \dots e_t \dots$ up to round $t$ is denoted by $P_t = \frac{1}{t} \sum_{i=1}^{t} w(e_i)$. The value of a
play is the limit average payoff (mean-payoff), namely, $\liminf_{t \to \infty} P_t$. (We note
that for games on graphs, the limsup metric gives the same complexity results.)
The objective of the maximizer is to maximize the mean-payoff of a play, and
the minimizer aims to minimize the mean-payoff.

Strategies, memoryless strategies, optimal strategies and winning
strategies. In this work, we consider only pure strategies for games on graphs, and
it is well-known that randomization does not give better strategies for
mean-payoff games. A strategy for maximizer is a function $\sigma : (V_1 \times V_2)^* \times V_1 \to E$
that decides the next move, and similarly, for the minimizer a strategy is a
function $\tau : (V_1 \times V_2)^* \to E$. A strategy is called memoryless if it depends only
on the current position of the pebble. Formally, a memoryless strategy for the
maximizer is a function $\sigma : V_1 \to E$ and similarly a memoryless strategy for the
minimizer is a function $\tau : V_2 \to E$.

A profile of strategies $(\sigma, \tau)$ uniquely determines the mean-payoff value of
a game. We say that a play $\pi = e_1 e_2 \dots e_n \dots$ is consistent with a maximizer
strategy $\sigma$ if there exists a minimizer strategy $\tau$ such that $\pi$ is formed by $(\sigma, \tau)$.
We say that the value of a maximizer strategy is $p$ if it can assure a value of
at least $p$ against any minimizer strategy. Analogously, we say that the value
of a minimizer strategy is $p$ if it can assure a value of at most $p$ against any
maximizer strategy.
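To make concrete how a profile of memoryless strategies determines the mean-payoff value, here is a minimal Python sketch. It assumes that both memoryless strategies are merged into one dictionary `move` (vertex to chosen successor) and that weights are stored in a dictionary keyed by edges; these representations and names are illustrative choices, not notation from the paper. Since the play is eventually periodic, its mean-payoff is the average weight of the cycle it enters.

```python
from fractions import Fraction

def mean_payoff_of_memoryless_profile(move, weight, start):
    """Mean-payoff of the unique play induced by memoryless strategies.

    move:   dict vertex -> successor chosen by the owner of the vertex
    weight: dict (u, v) -> rational edge weight
    start:  initial vertex of the pebble
    """
    first_visit = {}   # vertex -> position in the play at its first visit
    play = []
    v = start
    while v not in first_visit:
        first_visit[v] = len(play)
        play.append(v)
        v = move[v]
    # The play repeats forever from the first revisited vertex onwards.
    cycle = play[first_visit[v]:]
    total = sum(Fraction(weight[(u, move[u])]) for u in cycle)
    return total / len(cycle)

# Hypothetical game: maximizer owns 'a', 'b'; minimizer owns 'x', 'y'.
move = {"a": "x", "x": "b", "b": "y", "y": "b"}
weight = {("a", "x"): 10, ("x", "b"): 10, ("b", "y"): 3, ("y", "b"): -1}
print(mean_payoff_of_memoryless_profile(move, weight, "a"))
```

The finite prefix `a, x` contributes nothing to the limit, so only the cycle `b, y` determines the value.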

We say that a maximizer strategy is optimal if its value is maximal (with 
respect to all possible maximizer strategies). Analogously, a minimizer strategy 
is optimal if its value is minimal. For a given threshold, we say that a maximizer 
strategy is a winning strategy if it assures mean-payoff value that is greater or 
equal to the given threshold, and a minimizer strategy is winning if it assures 
value that is strictly smaller than the given threshold. 

One-player games, and games according to memoryless strategies. A
special (and easier) case of games on graphs is when the out-degree is one for all
the vertices that are owned by a certain player. In this case, all the choices are
made by one player. For a two-player game on graph $G$, and a memoryless player-1 strategy
$\sigma$, we define the one-player game graph $G^\sigma$ to be the game graph that is formed
by removing, for every player-1 vertex $v$, the out-edges that are not equal to
$\sigma(v)$.

Classical results on mean-payoff games. Mean-payoff games were
introduced in 1979 by Ehrenfeucht and Mycielski [5], and their main result was that
optimal strategies (for both players) exist, and moreover, the optimal value can
be obtained by a memoryless strategy. The decision problem for mean-payoff
games is to determine if the maximizer has a winning strategy with respect to
a given threshold. The existence of optimal memoryless strategies almost
immediately proves that the decision problem for mean-payoff games is in NP $\cap$ coNP,
and thus it is unlikely to be NP-hard (or coNP-hard). Zwick and Paterson [12]
introduced the first pseudo-polynomial algorithm, which runs in polynomial time
when the weights of the edges are encoded in unary. They also provided a
polynomial algorithm for the special case of one-player mean-payoff games. A
randomized sub-exponential algorithm for mean-payoff games is also known [3], but
despite many efforts, the existence of a polynomial algorithm to solve
mean-payoff games remains an open question, and it is one of the rare problems in
computer science that is known to be in NP $\cap$ coNP but for which no polynomial algorithm
is known.

We summarize the known results on mean-payoff games in the next theorems. 
The first theorem states that optimal strategies exist and moreover, there exist 
optimal strategies that are memoryless. 

Theorem 1 ([5]) For every mean-payoff game there exists a maximizer
memoryless strategy $\sigma$ and a minimizer memoryless strategy $\tau$ such that $\sigma$ is optimal
for the maximizer and $\tau$ is optimal for the minimizer.

The next theorem shows that there is a polynomial algorithm that computes 
optimal strategies if and only if there is a polynomial algorithm for the mean- 
payoff games decision problem. 

Theorem 2 ([12]) The following statements are equivalent:

— There exists a polynomial algorithm that determines if maximizer has a win- 
ning strategy with respect to a given threshold. 

— There exists a polynomial algorithm that determines if maximizer has a win- 
ning strategy with respect to threshold zero. 

— There exists a polynomial algorithm that computes the optimal value that the 
maximizer can assure. 

— There exists a polynomial algorithm that determines if the maximizer optimal 
strategy assures strictly positive value. 

— There exists a polynomial algorithm that computes a memoryless optimal 
strategy for maximizer. 
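One direction of these equivalences can be sketched in Python as follows. By Theorem 1 the optimal value is the average weight of a simple cycle, so for integer weights bounded by $W$ on a graph with $n$ vertices it is a rational $p/q$ with $1 \le q \le n$ and $|p| \le qW$. The sketch below binary-searches these candidates using a threshold oracle. The oracle `maximizer_wins` is hypothetical (no polynomial implementation is known), and enumerating all candidates is only pseudo-polynomial; it is meant to illustrate the idea, not to reproduce the argument of [12].

```python
from fractions import Fraction

def optimal_value(num_vertices, max_weight, maximizer_wins):
    """Recover the optimal value from a threshold decision oracle.

    maximizer_wins(threshold) is a hypothetical oracle that returns True iff
    the maximizer can assure a mean-payoff >= threshold. Integer weights in
    [-max_weight, max_weight] are assumed.
    """
    candidates = sorted({Fraction(p, q)
                         for q in range(1, num_vertices + 1)
                         for p in range(-q * max_weight, q * max_weight + 1)})
    # Invariant: the oracle answers True at candidates[lo]; this holds
    # initially because every play has mean-payoff at least -max_weight.
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if maximizer_wins(candidates[mid]):
            lo = mid
        else:
            hi = mid - 1
    return candidates[lo]

# Toy oracle for a game whose (assumed) optimal value is 2/3.
print(optimal_value(3, 2, lambda t: t <= Fraction(2, 3)))
```

The number of oracle calls is logarithmic in the number of candidates, which is the part of the argument that matters for the equivalence.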



3 Two-Player Zero-Sum (Alternating Move) Games are 
Inter-Reducible with Mean-Payoff Games 



In this section we prove that there is a polynomial algorithm that computes an 
exact equilibrium for two-player zero-sum (alternating move) games if and only 
if there exists a polynomial algorithm that solves mean-payoff games. 

The reduction from two-player zero-sum games to mean-payoff games is
trivial: for a two-player zero-sum game with actions $A_1, A_2$ and utility functions
$u_1 : A_1 \times A_2 \to \mathbb{Q}$ and $u_2 = -u_1$, we construct a complete bipartite game graph
$G = (V = A_1 \cup A_2, E = (A_1 \times A_2) \cup (A_2 \times A_1), w : E \to \mathbb{Q})$ such that the weight
of the transition from $a_1 \in A_1$ to $a_2 \in A_2$ is simply $u_1(a_1, a_2)$, and the weight of
the transition from $a_2$ to $a_1$ is also $u_1(a_1, a_2)$. It is a simple observation that a
pair of optimal strategies (for the maximizer and minimizer) in the mean-payoff
game induces an equilibrium strategy profile in the two-player zero-sum game
and vice versa.
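The construction of the game graph from a two-player zero-sum game can be sketched in a few lines of Python. Vertices are tagged with the owning player so that the two action sets stay disjoint; this tagging and the function name are assumptions of the example rather than part of the paper's construction.

```python
def zero_sum_game_to_graph(actions1, actions2, u1):
    """Build the weighted complete bipartite game graph of a zero-sum game.

    Returns a dict mapping directed edges to weights. Both directions of an
    edge between a1 and a2 receive the weight u1(a1, a2).
    """
    weight = {}
    for a1 in actions1:
        for a2 in actions2:
            w = u1(a1, a2)
            weight[(("P1", a1), ("P2", a2))] = w   # maximizer moves a1 -> a2
            weight[(("P2", a2), ("P1", a1))] = w   # minimizer moves a2 -> a1
    return weight

# Hypothetical example: matching pennies as an alternating move game.
pennies = zero_sum_game_to_graph([0, 1], [0, 1],
                                 lambda a1, a2: 1 if a1 == a2 else -1)
print(len(pennies))
```

The resulting weight function is symmetric by construction, which is the property that the converse reduction has to recover.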

The reduction for the converse direction is more complicated. For this purpose
we introduce the notion of an undirected game graph. A mean-payoff game graph is said
to be undirected if its edge relation is symmetric, and $w(v_1, v_2) = w(v_2, v_1)$ for
every edge $(v_1, v_2)$. (Basically, it is a game on an undirected graph.) The next
simple lemma shows a reduction from mean-payoff games on complete bipartite
undirected graphs to two-player zero-sum games.

Lemma 1 There is a polynomial reduction from mean-payoff games on complete 
bipartite undirected graphs to two-player zero-sum games. 

Proof. The proof is straightforward. Let $V_1$ and $V_2$ be the maximizer and
minimizer (resp.) vertices in the mean-payoff game. We construct a two-player
zero-sum game in the following way. The set of actions of player 1 is $A_1 = V_2$ and
the set of actions of player 2 is $A_2 = V_1$. We denote by $W$ the least value for
which all the weights in the undirected graph are in $[-W, +W]$, and the utility
function of player 1 is $u_1(a_1, a_2) = \frac{w(a_1, a_2)}{W} = \frac{w(a_2, a_1)}{W}$, and $u_2 = -u_1$. It is
trivial to observe that an equilibrium profile induces a pair of optimal strategies
for the mean-payoff game, and the proof follows. □

Due to Lemma 1, all that is left is to prove that mean-payoff games on
complete bipartite undirected graphs are equivalent to mean-payoff games. A 
recent result by Chatterjee, Henzinger, Krinninger and Nanongkai [1] gives us 
the first step towards such proof. 

Theorem 3 (Corollary 24 in [4]) Solving mean-payoff games on complete
bipartite (directed) graphs is as hard as solving mean-payoff games on arbitrary
graphs.

We use the above result as a black box and extend it to complete bipartite 
undirected graphs. We note that the main difference between directed and undi- 
rected graphs is that for undirected graphs the weight function is symmetric. 
In the rest of this section we will describe a process that for a given complete 



bipartite directed graph, generates a suitable symmetric weight function, and 
the winner in the generated graph is the same as in the original graph. 

We say that a directed game graph has a normalized weight function if it 
assigns a positive weight to every out-edge of maximizer vertex, and a negative 
weight for every out-edge of minimizer vertex. The next lemma shows that we 
may assume w.l.o.g that a directed game graph has a normalized weight function. 

Lemma 2 Solving mean-payoff games on (directed) bipartite graphs is polyno- 
mial time inter-reducible to solving mean-payoff games on (directed) bipartite 
graphs with normalized weights. 

Proof. Let $G$ be a non-normalized graph and let us denote by $W$ the heaviest
weight (in absolute value) that is assigned by its weight function $w$. We construct
a normalized graph $G'$ from $G$ by defining a weight function $w'$ as:

$$w'(u, v) = \begin{cases} w(u, v) + (W + 1) & \text{if } u \text{ is owned by the maximizer} \\ w(u, v) - (W + 1) & \text{otherwise (if } u \text{ is owned by the minimizer)} \end{cases}$$

Clearly, $G'$ is a normalized graph, and since $G'$ and $G$ are bipartite, it is
straightforward to observe that for any finite path $\pi$ we have that
$|w(\pi) - w'(\pi)| \le W + 1$, and thus, for every infinite path $\rho$, the mean-payoff value of
$\rho$ according to $w$ is identical to the mean-payoff value of $\rho$ according to $w'$. Therefore,
a maximizer winning (resp. optimal) strategy in graph $G$ is a winning (optimal)
strategy also in $G'$ and vice versa, and the proof of the lemma follows. □

In the next lemma we show that mean-payoff games on directed normalized
bipartite complete graphs are as hard as mean-payoff games on undirected
normalized bipartite complete graphs.

Lemma 3 The problem of determining whether maximizer has a winning
strategy for a threshold for a mean-payoff game on a directed normalized bipartite
complete graph is as hard as the corresponding problem for mean-payoff games
on an undirected normalized bipartite complete graph.

Proof. We show that for a given directed normalized bipartite complete graph
$G = (V, E, w)$ we can construct (in polynomial time) an undirected normalized
bipartite complete graph $G' = (V', E', w')$ such that maximizer has a winning
strategy in $G$ (for threshold 0) if and only if he has a winning strategy in $G'$
(again for threshold 0). Informally, we build $G'$ from $G$ by taking the same set
of vertices and by assigning a weight $w'(u, v)$ that matches either $w(u, v)$ or
$w(v, u)$.

To formally construct $w'$ we present the notion of surely losing edges. Recall
that $V_1$ are maximizer vertices and $V_2$ are minimizer vertices. We say that a
maximizer vertex out-edge $(v_1, v_2)$ is surely losing for maximizer if
$w(v_1, v_2) + w(v_2, v_1) < 0$. We claim that if maximizer has a winning strategy, then all
its memoryless winning strategies do not contain surely losing edges. That is,
for every memoryless winning strategy $\sigma$ and a surely losing edge $(v_1, v_2)$ we
have $\sigma(v_1) \ne (v_1, v_2)$. Indeed, towards a contradiction let us assume that $\sigma(v_1) =
(v_1, v_2)$, and recall that $G$ is a complete bipartite graph; then for the minimizer
strategy $\tau$ that leads from every minimizer vertex to $v_1$ we have that the
mean-payoff value of $(\sigma, \tau)$ is $\frac{w(v_1, v_2) + w(v_2, v_1)}{2} < 0$, in contradiction to the assumption
that $\sigma$ is a winning strategy. Analogously, $(v_2, v_1)$ is a surely losing edge for
minimizer if $w(v_1, v_2) + w(v_2, v_1) \ge 0$, and by the same arguments there is no
memoryless winning strategy for minimizer that contains a surely losing edge.

We are now ready to formally define $G' = (V', E', w')$. We define $V' = V$,
and since $G'$ is complete bipartite the definition of $E'$ follows immediately. For
an edge $\{v_1, v_2\} \in E'$, where $v_i \in V_i$, we define

$$w'(v_1, v_2) = \begin{cases} w(v_1, v_2) & \text{if } w(v_1, v_2) + w(v_2, v_1) \ge 0 \\ w(v_2, v_1) & \text{otherwise (if } w(v_1, v_2) + w(v_2, v_1) < 0) \end{cases}$$
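The two weight transformations used in this section (the normalization of Lemma 2 and the symmetric weight function $w'$ of Lemma 3) can be sketched in Python as follows. The dictionary-based graph representation, the `owner` map and the function names are assumptions of this example; the graph is assumed to be complete bipartite and directed, so both $(u, v)$ and $(v, u)$ appear as keys.

```python
def normalize(weight, owner):
    """Shift weights so maximizer out-edges are positive and minimizer
    out-edges are negative (the normalization of Lemma 2)."""
    shift = max(abs(x) for x in weight.values()) + 1   # W + 1
    return {(u, v): x + shift if owner[u] == "max" else x - shift
            for (u, v), x in weight.items()}

def symmetrize(weight, owner):
    """Build the symmetric weight function w' of Lemma 3.

    For v1 owned by the maximizer and v2 by the minimizer, w'(v1, v2) is
    w(v1, v2) if w(v1, v2) + w(v2, v1) >= 0, and w(v2, v1) otherwise.
    """
    sym = {}
    for (u, v), forward in weight.items():
        if owner[u] != "max":
            continue                      # handle each pair once, from V1
        backward = weight[(v, u)]
        chosen = forward if forward + backward >= 0 else backward
        sym[(u, v)] = chosen
        sym[(v, u)] = chosen
    return sym

# Hypothetical single-edge example: V1 = {a}, V2 = {x}.
owner = {"a": "max", "x": "min"}
normalized = normalize({("a", "x"): 2, ("x", "a"): -5}, owner)
print(normalized, symmetrize(normalized, owner))
```

In this toy instance the pair sums to a negative value after normalization, so the maximizer edge is surely losing and the minimizer's weight is kept on both directions.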

We first prove that if $\sigma$ is a memoryless winning strategy for maximizer in
$G$, then it is also a winning strategy in $G'$. Indeed, let $\rho$ be a path (in $G$ and
in $G'$) that is consistent with $\sigma$. We claim that for every path $\pi$ that is a finite
prefix of $\rho$ we have $w(\pi) \le w'(\pi)$ (that is, its weight in $G'$ is not less than
its weight in $G$), and we prove it by a simple induction on the length of $\pi$.
For $|\pi| = 0$ the claim is trivial. For $|\pi| > 0$ let $(u, v)$ be the last edge in $\pi$.
If $u \in V_1$, then we get that $\sigma(u) = (u, v)$ and therefore $(u, v)$ is not a surely
losing edge. Therefore, by definition, $w(u, v) = w'(u, v)$ and the claim follows
by the induction hypothesis (since it holds for the prefix of length $|\pi| - 1$). If
$u \in V_2$, then since $G$ is normalized we have that $w(u, v) < 0$ and $w(v, u) > 0$.
Therefore, by definition, $w'(u, v) \ge w(u, v)$, and by the induction hypothesis the
claim follows. Since $\sigma$ is a winning strategy we get that the mean-payoff value
of $\rho$ is non-negative according to $w$, and by the last claim we get that the
mean-payoff of $\rho$ according to $w'$ is greater than or equal to the mean-payoff value according
to $w$. Hence $\sigma$ is a winning strategy for maximizer also in $G'$.

By the same arguments we get that if minimizer has a memoryless winning
strategy in $G$, then the same strategy is winning for minimizer also in $G'$.

Finally, due to Theorem 1 we get that maximizer has a winning strategy in
$G$ if and only if he has a memoryless winning strategy in $G$ if and only if he has
a memoryless winning strategy in $G'$ if and only if he has a winning strategy in
$G'$, and the proof of the lemma follows. □

To conclude, by Lemma 1 we get that a polynomial algorithm for alternating
move two-player zero-sum games exists if and only if there exists a polynomial
algorithm for solving mean-payoff games on a complete bipartite undirected
graph, and by Lemmas 1, 2 and 3 and by Theorem 3 we get that the latter
exists if and only if there exists a polynomial algorithm for solving mean-payoff
games on arbitrary (directed) graphs. Hence, the main result of this section
follows.

Theorem 4 There exists a polynomial time algorithm for computing exact equi- 
librium for two-player zero-sum (alternating move) games if and only if there 
exists a polynomial time algorithm for solving mean-payoff games on graphs. 



4 Complexity of Computing Optimal Equilibrium 



In this section, we investigate the complexity of computing an optimal equilib- 
rium. Our main results are summarized in the next theorem: 

Theorem 5 1. Optimal equilibrium can be obtained by a profile of pure
strategies.

2. If mean-payoff games are in P, then there is a polynomial algorithm for
computing an exact optimal equilibrium.

3. If mean-payoff games are not in P, then there is no FPTAS that approximates
the social welfare of the optimal equilibrium.

4. There is an FPTAS to compute an $\epsilon$-equilibrium that is $\delta$-optimal. (Note that
it does not necessarily approximate the value of the optimal social welfare.)

We will prove Theorem 5 in the next four subsections: In Subsection 4.1 we
show the naive algorithm for computing an equilibrium that is based on the Folk
Theorem, and we prove basic properties of equilibriums in alternating move
games. In Subsection 4.2 we prove Theorem 5(1). In Subsection 4.3 we
investigate the complexity of computing the social welfare of the optimal equilibrium,
and prove Theorem 5(2) and Theorem 5(3). Finally, in Subsection 4.4 we prove
Theorem 5(4), which is the main result of this section.

In this section, we will model $n$-action $k$-player alternating move games by
a multi-weighted graph, according to the following conventions: The vertices of
the graph are the vertices in the set $V = (A_1 \times A_2 \times \cdots \times A_k) \times \{1, \dots, k\}$, and we
say that player $i$ owns the vertex set $V_i = (A_1 \times A_2 \times \cdots \times A_k) \times \{i\}$. Intuitively,
a vertex is characterized by an action vector and by a player that owns it. The
pair $(u, v)$ is in the edge relation if $u$ is owned by player $i$, $v$ is owned by player
$i + 1$ (where player $k + 1$ is player 1), and there is at most one difference in the
action vectors of $u$ and $v$ and it is in position $i$. The weight of every edge is a
vector of size $k$ that corresponds to the utility vector of the actions. Formally, if
$u = (\bar{a}, i)$ and $v = (\bar{a}', i + 1)$ then $w(u, v) = (u_1(\bar{a}'), u_2(\bar{a}'), \dots, u_k(\bar{a}'))$.

For an infinite path in the multi-weighted graph we define dimension
$i$ of the mean-payoff vector of the path to be the mean-payoff value of the path
according to dimension $i$. It is an easy observation that every infinite path in
the graph corresponds to a play and its mean-payoff vector corresponds to the
utility vector of the play. We note that the size of the graph is $k^2 \cdot n^k$, which is
polynomial in the size of the encoding of the utility functions (which is $k \cdot n^k$),
hence this graph can be constructed in polynomial time.
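The multi-weighted graph model can be built explicitly for small games, as in the Python sketch below. Players are indexed from 0 rather than 1, and the function name and data layout are choices made for this example, not part of the paper.

```python
from itertools import product

def build_game_graph(action_sets, utilities):
    """Construct the multi-weighted graph of an alternating move game.

    action_sets: list of k lists, the action set of each player
    utilities:   list of k functions, each mapping an action vector to a number
    A vertex is (action_vector, owner); the owner may change only his own
    coordinate, and the edge weight is the utility vector of the new vector.
    """
    k = len(action_sets)
    vertices = [(vec, i) for vec in product(*action_sets) for i in range(k)]
    edges = {}
    for vec, i in vertices:
        next_owner = (i + 1) % k
        for action in action_sets[i]:
            new_vec = vec[:i] + (action,) + vec[i + 1:]
            edges[((vec, i), (new_vec, next_owner))] = tuple(
                u(new_vec) for u in utilities)
    return vertices, edges

# Hypothetical 2-player, 2-action zero-sum example.
u0 = lambda v: 1 if v[0] == v[1] else -1
u1 = lambda v: -u0(v)
vertices, edges = build_game_graph([[0, 1], [0, 1]], [u0, u1])
print(len(vertices), len(edges))
```

For $k = 2$ and $n = 2$ this yields $k \cdot n^k = 8$ vertices, each with $n$ outgoing edges carrying a weight vector of dimension $k$.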

4.1 Basic properties of equilibriums 

The Folk Theorem gives a conceptually simple (but inefficient) technique to
construct an equilibrium. Intuitively, an equilibrium is obtained when each of
the players plays as if the goal of all the other players is to minimize his utility, and
if one of the players deviates from this strategy, then all the other players switch
to playing according to a strategy that will minimize the utility of the rebellious
player. Formally, let $G$ be the corresponding $k$-player game graph that models
the alternating move game. For every player $i$, we consider a zero-sum two-player
mean-payoff game graph $G^i$ in which the maximizer owns player $i$ vertices and
the minimizer owns the other vertices. Let $\sigma_i$ be an arbitrary optimal strategy
for the maximizer in graph $G^i$, let $\overline{\sigma}_i$ be an arbitrary optimal strategy for the
minimizer in $G^i$, and let $v_i$ be the value that is obtained by the strategy profile
$(\sigma_i, \overline{\sigma}_i)$. Then if every player $i$ plays according to the strategy:

If player $j \ne i$ deviated from $\sigma_j$, then play according to $\overline{\sigma}_j$ forever, and
otherwise play according to $\sigma_i$

an equilibrium is formed (since by definition, playing according to $\sigma_i$ assures
utility at least $v_i$, and deviating from $\sigma_i$ assures utility at most $v_i$).
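The threat strategy described above can be sketched as a Python function that inspects the history of play. The sketch assumes pure prescribed strategies, represents a history as a list of (player, action) pairs in order of play, and takes the optimal and punishing strategies as given callables; computing those strategies is the hard part and is not addressed here. All names are hypothetical.

```python
def make_threat_strategy(i, prescribed, punish):
    """Build player i's threat ("grim trigger") strategy.

    prescribed[j](history) -> action that player j is supposed to play
    punish[j](history)     -> action of player i when punishing player j
    """
    def strategy(history):
        for t, (j, action) in enumerate(history):
            # The first player other than i who left his prescribed
            # strategy is punished forever.
            if j != i and action != prescribed[j](history[:t]):
                return punish[j](history)
        return prescribed[i](history)
    return strategy

# Toy instance: both players are supposed to play 0; punishment is action 1.
prescribed = {0: lambda h: 0, 1: lambda h: 0}
punish = {1: lambda h: 1}
s0 = make_threat_strategy(0, prescribed, punish)
print(s0([(0, 0), (1, 0)]), s0([(0, 0), (1, 1)]))
```

Player i's own actions are never treated as deviations, so the punishing moves he makes after a deviation do not reset the threat.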

In the next lemma, we extend the basic principle of the Folk Theorem and
get a characterization of all the equilibriums that are obtained by a profile of
pure strategies.

Lemma 4 Let $(t_1, t_2, \dots, t_k)$ be a utility vector such that $t_i \ge v_i$ (for every
player $i$), then there exists a pure equilibrium with utility exactly $t_i$ for every
player $i$ if and only if there exists an infinite path $\pi$ in the graph $G$ with
mean-payoff vector $(t_1, t_2, \dots, t_k)$.

Proof. The direction from left to right is trivial, since a profile of pure strategies
has a (unique) infinite path in the graph that is consistent with the strategies.
To prove the converse direction we introduce the notion of a path strategy and
the notion of path equilibrium. For a path $\pi$ we define the path strategy for player
$i$ to be: at round $j$ (which is a player $i$ round), play according to the $j$-th edge in
$\pi$. We define the strategy $\sigma_i^*$ to be: If player $j \ne i$ deviated from the path
strategy of $\pi$, then play forever according to $\overline{\sigma}_j$ (the optimal minimizer strategy
in $G^j$), and otherwise play according to
the path strategy of $\pi$. The path equilibrium of the infinite path $\pi$ is the profile
$(\sigma_1^*, \dots, \sigma_k^*)$. The profile is an equilibrium since it assures a utility $t_i \ge v_i$ for
every player, and if player $i$ deviates from the strategy $\sigma_i^*$ he will end up with a
utility of at most $v_i$. □

4.2 Optimal equilibrium can be obtained by pure strategies 

In this subsection, we extend Lemma 4 also for the case of mixed strategies,
and as a consequence we get that optimal equilibrium can be obtained by pure
strategies. Intuitively, we wish to show that if a profile of (mixed) strategies yields
a utility vector $(t_1, \dots, t_k)$, then there exists an infinite path in the graph with
mean-payoff vector that is greater than or equal to (in every dimension) $(t_1, \dots, t_k)$.
Then we get that if a utility vector is obtained by a profile of mixed strategies,
then by Lemma 4 it is also obtained by a profile of pure strategies.
We formally prove the above by the next two lemmas.

Lemma 5 Let $G$ be a multi-weighted graph that is strongly connected, and let
$(t_1, \dots, t_k)$ be a vector. Then if for every $\alpha > 0$ there exists a (finite) cyclic
path with average weight at least $t_i - \alpha$ in every dimension, then there exists an
infinite path with mean-payoff vector at least $(t_1, \dots, t_k)$.



Proof. We assume that for every $\alpha > 0$ there exists a finite cyclic path $C_\alpha$ with
average weight at least $t_i - \alpha$ in every dimension, and we let $v_\alpha$ be an arbitrary
vertex in the cycle $C_\alpha$. For every two vertices $u$ and $v$, we denote by $\pi_{u,v}$ the
shortest path from $u$ to $v$ (recall that $G$ is strongly connected), and we denote by
$W$ the size of the biggest weight (in absolute value) in $G$. Intuitively, we obtain
an infinite path with mean-payoff vector at least $(t_1, \dots, t_k)$ by following the
cycle $C_\alpha$ for $\alpha = 1$, and then we follow the path $\pi_{v_1, v_{1/2}}$ and we follow the cycle
$C_{1/2}$ twice, and then we follow $C_{1/3}$ three times and so on. However, we have to
make sure that the average payoff does not decrease too much when following a
cycle for the first time.

Formally, for every $i \in \mathbb{N}$, we denote by $L_i$ the length of the cycle $C_{1/i}$, and by
$m_i$ the number $i W L_{i+1}$. We define $\rho_0$ to be the empty path, and for every $i > 0$ we define
$\rho_i = \rho_{i-1} \, \pi_{v_{1/(i-1)}, v_{1/i}} \, C_{1/i}^{m_i}$ (for $i = 1$ the connecting path is empty), and we define $\rho$ to be the infinite path that is the limit
of the sequence $\rho_0, \rho_1, \dots, \rho_i, \dots$. Due to the fact that the length of $\pi_{v_{1/(i-1)}, v_{1/i}}$ is
bounded (by the size of the graph), and since the maximal weight is at most $W$
(and the minimal weight is at least $-W$), we get, by simple algebra, that the
mean-payoff vector of $\rho$ is at least $(t_1, \dots, t_k)$. □

Lemma 6 Let $\sigma$ be a profile of (mixed) strategies with utility vector $(t_1, \dots, t_k)$.
Then for every $\alpha > 0$ there exists a cyclic path in the game graph with average
weight at least $t_i - \alpha$ in every dimension.

Proof. By definition of the utility function, for every $\delta > 0$ there exists
a round $j$ such that the expected average utility, in every round after $j$, is
at least $t_i - \delta$ in every dimension. We denote by $\Pi_j$ the (finite) set
of paths of length $j$ that have non-zero probability according to the strategy
profile $\sigma$, and w.l.o.g. we assume that all the paths have the same
probability (and a path may occur more than once in $\Pi_j$). For a path
$\pi \in \Pi_j$, we denote by $C_\pi$ the longest cyclic path that is a sub-path
of $\pi$. We note that since $G$ is a finite graph, we get that
$|C_\pi| \ge |\pi| - 2|G|$ (where $|G|$ denotes the number of vertices in $G$),
and in every dimension:
$w(C_\pi) \ge w(\pi) - 2|G|W$ and
$\mathrm{Avg}(C_\pi) \ge \mathrm{Avg}(\pi) - \frac{2|G|W}{|\pi|} = \mathrm{Avg}(\pi) - \frac{2|G|W}{j}$.

We partition $\Pi_j$ into $|G|$ sets (some of them may be empty), namely
$\Pi_j^v$ for every $v \in G$, such that for every path $\pi \in \Pi_j^v$ we have
that $v$ is in the cyclic path $C_\pi$. For a set
$\Pi_j^v = \{\pi_1, \dots, \pi_m\}$, we denote
$C(\Pi_j^v) = C_{\pi_1} C_{\pi_2} \dots C_{\pi_m}$ (note that $C(\Pi_j^v)$ is a
path). For every two vertices $u, v \in G$, we denote by $\pi_{u,v}$ the shortest
path between $u$ and $v$, and we note that $|\pi_{u,v}| \le |G|$ and
$w(\pi_{u,v}) \ge -|G|W$ in every dimension. Finally, we assume that the vertex
set of $G$ is $V = \{v_1, \dots, v_m\}$ and we define the cyclic path

$$\pi = C(\Pi_j^{v_1})\,\pi_{v_1,v_2}\,C(\Pi_j^{v_2})\,\pi_{v_2,v_3} \cdots \pi_{v_{m-1},v_m}\,C(\Pi_j^{v_m})\,\pi_{v_m,v_1}$$

The average weight of $\pi$ in every dimension is at least

$$\mathrm{Avg}(\Pi_j) - \frac{3|G|W}{j}$$

and since in every dimension $\mathrm{Avg}(\Pi_j) \ge t_i - \delta$, then for
$j \ge \frac{3|G|W}{\delta}$ we get that in every dimension

$$\mathrm{Avg}(\pi) \ge t_i - 2\delta$$

and for $\delta = \frac{\alpha}{2}$ we get that $\pi$ is a cyclic path with average
weight at least $t_i - \alpha$ in every dimension, and the proof of the lemma
follows. □

We are now ready to prove that the utility vector of a mixed equilibrium can be 
obtained by a pure equilibrium. 

Proposition 1 Let $\sigma$ be a profile of mixed strategies that induces a utility
vector $(t_1, \dots, t_k)$. Then there exists a profile $\sigma'$ of pure
strategies that induces exactly the same utility vector. Moreover, if $\sigma$ is
an equilibrium, then so is $\sigma'$.

Proof. By Lemma 6 we get that for every $\alpha > 0$ there is a cyclic path with
average weight at least $t_i - \alpha$ in every dimension. Therefore, by Lemma 5 we
get that there is an infinite path in $G$ with mean-payoff vector at least
$(t_1, \dots, t_k)$, and by Lemma 4 we get that there is a profile of pure
strategies that has utility at least $(t_1, \dots, t_k)$. If $\sigma$ is an
equilibrium we get that $t_i \ge \nu_i$ (since otherwise player $i$ would deviate
to a strategy that secures $\nu_i$), and thus, by Lemma 4 we get that there is a
pure equilibrium that gives the same utility vector. □

The next corollary immediately follows from Proposition 1.

Corollary 1 (Theorem 5(1)) An optimal equilibrium can be obtained by a
profile of pure strategies.

4.3 The complexity of computing the social welfare of the optimal 
(exact) equilibrium 

In this section we show that if there is a polynomial algorithm for mean-payoff
games, then there is a polynomial algorithm to compute an optimal equilibrium
in a $k$-player alternating move game. We also prove the converse direction, that
is, we show that if there is a polynomial algorithm that computes the social
welfare of the optimal equilibrium, then there is a polynomial algorithm that
solves mean-payoff games. We prove these two assertions in the next two lemmas.

Lemma 7 If mean-payoff games are in P, then there is a polynomial
algorithm that computes the social welfare of an optimal equilibrium.

Proof. Due to Corollary 1 it is enough to consider only pure strategies, and due
to Lemma 4 a vector of utilities $(t_1, \dots, t_k)$ is obtained by a pure
equilibrium if and only if $t_i \ge \nu_i$ (for $i = 1, \dots, k$) and there is an
infinite path in the game graph with mean-payoff $(t_1, \dots, t_k)$. Since we
assume that there is a polynomial algorithm for computing $\nu_i$, our problem
boils down to

Find the maximal value of $\sum_{i=1}^{k} t_i$ subject to
- $t_i \ge \nu_i$; and
- there exists an infinite path with mean-payoff vector at least $(t_1, \dots, t_k)$.

It was shown in [11] (in the proof of Theorem 18) that the problem of deciding
whether there exists an infinite path with mean-payoff vector at least
$(t_1, \dots, t_k)$ can be reduced (in polynomial time) to a set of linear
constraints. Moreover, the generated set of constraints remains linear even when
$t_i$ is a variable. Hence, we can find a feasible threshold vector
$(t_1, \dots, t_k)$ (that is, a vector that is realizable by an infinite path in
the graph) that maximizes $\sum_{i=1}^{k} t_i$ by linear programming. Therefore,
if we have a polynomial algorithm that computes $\nu_i$, then we can find the
social welfare of the optimal equilibrium in polynomial time. □
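
To make the shape of such a linear program concrete, the block below gives one common edge-frequency formulation for a strongly connected graph. It is a sketch under stated assumptions and is not claimed to be the exact constraint system of the cited reduction: the variables $x_e$ (long-run frequency of edge $e$), the sets $\mathrm{In}(v)$ and $\mathrm{Out}(v)$ of incoming and outgoing edges of $v$, and $w_i(e)$ (weight of $e$ in dimension $i$) are notation introduced here for illustration.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{i=1}^{k} t_i \\
\text{subject to}\quad
 & \sum_{e \in \mathrm{In}(v)} x_e = \sum_{e \in \mathrm{Out}(v)} x_e
   && \text{for every vertex } v, \\
 & \sum_{e \in E} x_e = 1, \qquad x_e \ge 0
   && \text{for every edge } e, \\
 & \sum_{e \in E} x_e \, w_i(e) \ge t_i
   && \text{for } i = 1, \dots, k, \\
 & t_i \ge \nu_i
   && \text{for } i = 1, \dots, k.
\end{align*}
```

In this sketch both the $x_e$ and the $t_i$ are variables, so the thresholds can be optimized together with the path frequencies, which is the point made in the proof about the constraints staying linear.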

Lemma 7 proves Theorem 5(2) and gives an upper bound on the complexity of
computing an optimal equilibrium. In the next lemma we show that this bound is
tight, and that the social welfare of the optimal equilibrium cannot be
approximated, unless mean-payoff games are in P.

Lemma 8 (Theorem 5(3)) There is no FPTAS that approximates the social
welfare of an optimal equilibrium, unless mean-payoff games are in P.

Proof. Due to Theorem [5] and Theorem H] it is enough to show that if we could 
approximate the social welfare of the optimal equilibrium in a three-player game, 
then we would be able to determine whether in a two-player zero-sum game, 
player 1 has a strategy that assures a value that is strictly greater than 0. The 
proof is straightforward. Let $(A_1, A_2)$ be the actions of a zero-sum two-player
game with utility functions $u_1 : A_1 \times A_2 \to [-1, 1]$ and $u_2 = -u_1$.
We construct a three-player game $(A'_1, A'_2, A'_3, u'_1, u'_2, u'_3)$ in the
following way:

- $A'_1 = A_1 \cup \{\$\}$, $A'_2 = A_2 \cup \{\$\}$ and $A'_3 = \{\$\}$ (where \$ is a fresh action).

- Let $\hat{a}_2$ be an arbitrary action in $A'_2 - \{\$\}$. We define the utility function $u'_1$ to be

$$u'_1(a_1, a_2, a_3) = \begin{cases}
u_1(a_1, a_2) & \text{if } a_1 \ne \$ \text{ and } a_2 \ne \$ \\
u_1(a_1, \hat{a}_2) & \text{if } a_1 \ne \$ \text{ and } a_2 = \$ \\
0 & \text{otherwise (if } a_1 = \$)
\end{cases}$$

We define $u'_2 = -u'_1$, and

$$u'_3(a_1, a_2, a_3) = \begin{cases}
1 & \text{if } a_1 = \$ \\
-1 & \text{otherwise (if } a_1 \ne \$)
\end{cases}$$

The reader can verify that if player 1 has a strategy to assure utility greater
than 0 in the zero-sum game, then in any profile of equilibrium in the
three-player game he will play \$ only for a negligible number of rounds, and the
social welfare of the equilibrium will be $-1$. On the other hand, if player 1
cannot assure utility at least 0, then a profile of strategies in which all three
players play \$ forever is an equilibrium and its social welfare is 1. Hence, even
for $\epsilon = 1$ we cannot approximate the social welfare of the optimal
equilibrium, unless mean-payoff games are in P. □
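
The three-player construction can be written out as a small program. This is a minimal sketch under assumptions: it takes the payoff of player 1 for playing \$ to be 0 and the payoff of player 3 to be 1 when player 1 plays \$ and -1 otherwise, which matches the welfare values 1 and -1 quoted in the proof. The function name, the dictionary representation of $u_1$, and the choice of the first listed action as the arbitrary fixed action of player 2 are all hypothetical.

```python
DOLLAR = "$"  # the fresh action added to every player's action set


def extend_to_three_players(actions1, actions2, u1):
    """Build the three-player game from a zero-sum two-player game.

    u1 maps (a1, a2) to player 1's payoff in [-1, 1]. The returned payoff
    function maps a profile (a1, a2, a3) to the triple (u1', u2', u3').
    """
    fixed_a2 = actions2[0]  # arbitrary action used when player 2 plays $
    new_a1 = list(actions1) + [DOLLAR]
    new_a2 = list(actions2) + [DOLLAR]
    new_a3 = [DOLLAR]

    def payoff(a1, a2, a3):
        if a1 == DOLLAR:
            p1 = 0  # assumed value: opting out yields 0 for player 1
        elif a2 == DOLLAR:
            p1 = u1[(a1, fixed_a2)]
        else:
            p1 = u1[(a1, a2)]
        # Player 3 is rewarded exactly when player 1 opts out.
        p3 = 1 if a1 == DOLLAR else -1
        return (p1, -p1, p3)

    return new_a1, new_a2, new_a3, payoff
```

Since the payoffs of players 1 and 2 always cancel, the social welfare of any profile equals the payoff of player 3, so it only records whether player 1 plays \$.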



4.4 An FPTAS to compute an $\epsilon$-equilibrium that is $\delta$-optimal

In this subsection, we assume that the utilities of the players are scaled to
rationals in $[-1, 1]$, and we describe an algorithm that computes an
$\epsilon$-equilibrium that is $\delta$-optimal (with respect to all
$\epsilon$-equilibriums) and runs in time polynomial in the input size and in
$\frac{1}{\epsilon}$ and $\frac{1}{\delta}$. Subsection 4.3 suggests that in order
to compute a $\delta$-optimal $\epsilon$-equilibrium we should approximate (by
some value) the values of $\nu_1, \dots, \nu_k$ and then compute the optimal
infinite path (with respect to the sum of utilities) that has utility for player
$i$ that is greater than the approximation of $\nu_i$. However, this approach
would not work, since the optimal social welfare is not a continuous function
with respect to the values $\nu_1, \dots, \nu_k$.

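We denote by $\mathrm{OPT}_\epsilon$ the social welfare of the optimal
$\epsilon$-equilibrium. We base our solution on the next lemma, which gives two
key properties of $\mathrm{OPT}_\epsilon$.

Lemma 9 1. If $\epsilon_1 \ge \epsilon_2$, then
$\mathrm{OPT}_{\epsilon_1} \ge \mathrm{OPT}_{\epsilon_2}$.

2. For every $\alpha \in [0, 1]$ and $\epsilon_1, \epsilon_2 \ge 0$, let
$\epsilon = \alpha\epsilon_1 + (1 - \alpha)\epsilon_2$; then there exists
an $\epsilon$-equilibrium with social welfare

$$\alpha\,\mathrm{OPT}_{\epsilon_1} + (1 - \alpha)\,\mathrm{OPT}_{\epsilon_2}$$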

Proof. The first item of the lemma is a trivial observation. In order to prove
the second item, we observe that by Lemma 4 (and since by Proposition 1 it is
enough to consider only pure equilibriums) it is enough to prove that there is
an infinite path $\pi$ with utility at least $\nu_i - \epsilon$ in every dimension
and with social welfare
$\alpha\,\mathrm{OPT}_{\epsilon_1} + (1 - \alpha)\,\mathrm{OPT}_{\epsilon_2}$. By
Proposition 1 it is enough to show that there is a profile $\sigma$ of mixed
strategies (that need not be an equilibrium) that has utility at least
$\nu_i - \epsilon$ in every dimension and has social welfare at least
$\alpha\,\mathrm{OPT}_{\epsilon_1} + (1 - \alpha)\,\mathrm{OPT}_{\epsilon_2}$. The
construction of $\sigma$ is trivial. For $i = 1, 2$, let $\sigma_{\epsilon_i}$ be
a profile of strategies that induces an optimal $\epsilon_i$-equilibrium; then we
construct $\sigma$ by playing according to $\sigma_{\epsilon_1}$ with probability
$\alpha$ and according to $\sigma_{\epsilon_2}$ with probability $1 - \alpha$. □
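
The calculation behind the mixed profile can be spelled out as follows. This is a sketch that assumes the utility of the mixture is the $\alpha$-weighted average of the utilities of the two profiles (which holds, for instance, when the limit averages exist); $u_i(\cdot)$ denotes the utility of player $i$, a notation introduced here for illustration.

```latex
\begin{align*}
u_i(\sigma)
  &= \alpha\, u_i(\sigma_{\epsilon_1}) + (1-\alpha)\, u_i(\sigma_{\epsilon_2}) \\
  &\ge \alpha\,(\nu_i - \epsilon_1) + (1-\alpha)\,(\nu_i - \epsilon_2) \\
  &= \nu_i - \bigl(\alpha\epsilon_1 + (1-\alpha)\epsilon_2\bigr)
   = \nu_i - \epsilon, \\[4pt]
\sum_{i=1}^{k} u_i(\sigma)
  &= \alpha \sum_{i=1}^{k} u_i(\sigma_{\epsilon_1})
   + (1-\alpha) \sum_{i=1}^{k} u_i(\sigma_{\epsilon_2})
   = \alpha\,\mathrm{OPT}_{\epsilon_1} + (1-\alpha)\,\mathrm{OPT}_{\epsilon_2}.
\end{align*}
```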

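Corollary 2 For every $\zeta \le \frac{\epsilon\delta}{4k}$ we have

$$\mathrm{OPT}_{\epsilon+\zeta} - \frac{\delta}{2} \le \mathrm{OPT}_\epsilon \le \mathrm{OPT}_{\epsilon+\zeta}$$

Proof. The fact that $\mathrm{OPT}_\epsilon \le \mathrm{OPT}_{\epsilon+\zeta}$
follows immediately from Lemma 9(1). To prove that
$\mathrm{OPT}_{\epsilon+\zeta} - \frac{\delta}{2} \le \mathrm{OPT}_\epsilon$, we
set $\alpha = \frac{\zeta}{\epsilon+\zeta}$, and by Lemma 9(2), and since
$\alpha \cdot 0 + (1 - \alpha)(\epsilon + \zeta) = \epsilon$, we get

$$\alpha\,\mathrm{OPT}_0 + (1 - \alpha)\,\mathrm{OPT}_{\epsilon+\zeta} \le \mathrm{OPT}_\epsilon$$

Since the utility function is scaled to $[-1, 1]$ we get that
$\mathrm{OPT}_0 \ge -k$ and $\mathrm{OPT}_{\epsilon+\zeta} \le k$, and thus we have

$$-2k\alpha + \mathrm{OPT}_{\epsilon+\zeta} \le \mathrm{OPT}_\epsilon$$

Since $\alpha = \frac{\zeta}{\epsilon+\zeta} \le \frac{\zeta}{\epsilon}$ we get

$$-\frac{2k\zeta}{\epsilon} + \mathrm{OPT}_{\epsilon+\zeta} \le \mathrm{OPT}_\epsilon$$

and since $\zeta \le \frac{\epsilon\delta}{4k}$ we have

$$\mathrm{OPT}_{\epsilon+\zeta} - \frac{\delta}{2} \le \mathrm{OPT}_\epsilon$$

□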
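
The choice of the mixing weight in the proof can be checked with exact arithmetic. The following is a minimal sketch, assuming only what the proof states: utilities lie in $[-1, 1]$, so the social welfare of a $k$-player profile lies in $[-k, k]$. Both function names are hypothetical.

```python
from fractions import Fraction


def mixing_weight(eps, zeta):
    """Weight alpha with alpha * 0 + (1 - alpha) * (eps + zeta) == eps."""
    return Fraction(zeta) / (Fraction(eps) + Fraction(zeta))


def welfare_loss_bound(eps, zeta, k):
    """Bound 2 * k * alpha on OPT_{eps+zeta} - OPT_eps, as in the proof."""
    return 2 * k * mixing_weight(eps, zeta)
```

For example, with $\epsilon = 1/10$, $\zeta = 1/100$ and $k = 3$ the weight is $\alpha = 1/11$ and the loss bound is $6/11$, which is below the cruder estimate $2k\zeta/\epsilon = 6/10$.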

By the above corollary, to approximate $\mathrm{OPT}_\epsilon$ it is enough to
approximate by $\frac{\delta}{2}$ the value of $\mathrm{OPT}_{\epsilon+\zeta}$ for
some $\zeta \le \frac{\epsilon\delta}{4k}$. For this purpose, we extend the notion
of $\epsilon$-equilibrium also to $k$-dimensional vectors, and we say that a
profile of strategies is a $\beta$-equilibrium if player $i$ cannot improve its
utility by at least $\beta_i$. Let us denote by $\min\beta$ and by $\max\beta$ the
minimal and maximal element of $\beta$ (respectively). Then by definition,

$$\mathrm{OPT}_{\min\beta} \le \mathrm{OPT}_\beta \le \mathrm{OPT}_{\max\beta}$$

and by Corollary 2 we get that

$$\mathrm{OPT}_\beta \le \mathrm{OPT}_{\max\beta} \le \mathrm{OPT}_\beta + (\max\beta - \min\beta) \cdot \frac{4k}{\epsilon} \qquad (1)$$

We are now ready to present an FPTAS that computes a $\delta$-approximation
for $\mathrm{OPT}_\epsilon$: (1) Set $\zeta = \frac{\epsilon\delta}{4k}$ and
compute a $\frac{\zeta}{2}$-approximation of $\nu_i$ for every player $i$, and
denote it by $r_i$. (2) Compute the optimal path $\pi$ (with respect to social
welfare) that has utility at least $r_i - (\epsilon - \zeta)$ for every player,
and return its social welfare.

We note that we can execute the first step of the algorithm in polynomial 
time due to [^(Observation 3.1), and we can execute the second step in polyno- 
mial time by solving the linear programming problem that we described in the 
proof of Lemma [7] The next lemma proves the correctness of our approximation 
algorithm. 

Lemma 10 Let $S(\pi)$ denote the social welfare of $\pi$. Then
$S(\pi) \le \mathrm{OPT}_\epsilon \le S(\pi) + \delta$.

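Proof. We denote the utility of player $i$ according to $\pi$ by $u_i(\pi)$ and we
construct the vector $\beta$ by defining

$$\beta_i = \begin{cases}
\nu_i - u_i(\pi) & \text{if } \nu_i - u_i(\pi) \ge \epsilon - \frac{3\zeta}{2} \\
\epsilon - \frac{3\zeta}{2} & \text{otherwise}
\end{cases}$$

and we claim that $\pi$ is an optimal $\beta$-equilibrium. The claim holds because
for every $\beta$-equilibrium, the utility of player $i$ is at least
$\nu_i - (\epsilon - \zeta)$, and $\pi$ is the optimal path with utility at least
$\nu_i - (\epsilon - \zeta)$ for every player. In addition, we claim that
$\max\beta - \min\beta$ is at most $\frac{\zeta}{2}$. The claim holds because by
the construction of $\beta$ we have that $\min\beta \ge \epsilon - \frac{3\zeta}{2}$,
and $\nu_i - u_i(\pi)$ is at most $\epsilon - \zeta$ (since $\pi$ is an
$(\epsilon - \zeta)$-equilibrium). Hence
$\max\beta - \min\beta \le \frac{\zeta}{2}$. Therefore, since
$\max\beta \le \epsilon$ and by Equation (1) we get that

$$S(\pi) \le \mathrm{OPT}_\epsilon \le S(\pi) + \frac{\zeta}{2} \cdot \frac{4k}{\epsilon}$$

and since $\zeta = \frac{\epsilon\delta}{4k}$ we get that

$$S(\pi) \le \mathrm{OPT}_\epsilon \le S(\pi) + \frac{\delta}{2}$$

and the assertion of the lemma follows. □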

Lemma 10, along with the complexity analysis that we provided, proves that there
is an FPTAS to compute an $\epsilon$-equilibrium that is $\delta$-optimal, and
Theorem 5(4) follows. We also note that our proof for Theorem 5(4) gives a
constructive (and polynomial) algorithm that computes a description of an actual
$\epsilon$-equilibrium that is $\delta$-optimal.
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